Self-organization in mixture densities of HMM based speech recognition
Author
Abstract
In this paper, experiments are presented on applying the Self-Organizing Map (SOM) and Learning Vector Quantization (LVQ) to the training of mixture density hidden Markov models (HMMs) for automatic speech recognition. The decoding of spoken words into text is performed using speaker-dependent, but vocabulary- and context-independent, phoneme HMMs. Each HMM has a set of states, and the output density of each state is a unique mixture of Gaussian densities. The mixture densities are trained by segmental versions of SOM and LVQ3: SOM is applied to initialize and smooth the mixture densities, and LVQ3 to simply and robustly decrease recognition errors.
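To make the training scheme concrete, the sketch below shows, under stated assumptions, how one codebook of Gaussian mean vectors for a phoneme state could be organized by SOM passes and then fine-tuned by LVQ3 passes. The function names (som_epoch, lvq3_epoch), the grid layout, and the learning-rate handling are illustrative choices rather than the authors' implementation, and the segmental aspect (re-segmenting the training data with the current models between passes) is omitted here.

```python
# Minimal sketch (not the authors' code): SOM organization and LVQ3
# fine-tuning of the Gaussian mean vectors ("codebook") that form the
# mixture density of one phoneme HMM state.
import numpy as np

def som_epoch(means, grid, data, alpha, sigma):
    """One SOM pass: pull every mean toward each input vector, weighted by
    a Gaussian neighborhood around the best-matching unit on the grid."""
    for x in data:
        bmu = np.argmin(np.sum((means - x) ** 2, axis=1))
        grid_dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))   # neighborhood weights
        means += alpha * h[:, None] * (x - means)
    return means

def lvq3_epoch(means, labels, data, data_labels, alpha, eps=0.1, window=0.3):
    """One LVQ3 pass: adapt the two nearest codebook vectors of each
    labeled feature vector according to the LVQ3 rules."""
    for x, y in zip(data, data_labels):
        d = np.sqrt(np.sum((means - x) ** 2, axis=1)) + 1e-12
        i, j = np.argsort(d)[:2]                       # two nearest units
        if labels[i] == y and labels[j] == y:
            # both nearest units correct: move both slightly toward x
            means[i] += eps * alpha * (x - means[i])
            means[j] += eps * alpha * (x - means[j])
        elif (labels[i] == y) != (labels[j] == y):
            # one correct, one wrong: update only if x falls in the window
            if min(d[i] / d[j], d[j] / d[i]) > (1 - window) / (1 + window):
                c, w = (i, j) if labels[i] == y else (j, i)
                means[c] += alpha * (x - means[c])     # attract correct unit
                means[w] -= alpha * (x - means[w])     # repel wrong unit
    return means
```

In use, one such codebook per HMM state would be trained on the feature vectors assigned to that state by a Viterbi segmentation, with the learning rate alpha and the neighborhood width sigma decreasing between passes.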
منابع مشابه
Mixture trees - hierarchically tied mixture densities for modeling HMM emission probabilities
We propose a novel hierarchical mixture model and present its application to acoustic modeling for HMM based large vocabulary conversational speech recognition. We detail an EM algorithm for estimating the parameters of such a mixture tree for the case of Gaussian component densities. We sketch how clustering algorithms can be applied to automatically construct suitable mixture trees for a larg...
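For reference, the EM estimation mentioned in this excerpt reduces, in the non-hierarchical case, to the standard EM step for a Gaussian mixture; the sketch below shows only that baseline step for diagonal covariances and does not attempt to reproduce the tying across tree nodes. All names and shapes are illustrative assumptions.

```python
# Illustrative single EM step for a diagonal-covariance Gaussian mixture;
# the hierarchical tying of the mixture-tree model is not reproduced here.
import numpy as np

def em_step(X, weights, means, variances):
    """X: (N, D) data; weights: (K,); means, variances: (K, D)."""
    N, D = X.shape
    # E-step: component responsibilities, computed in the log domain
    diff2 = (X[:, None, :] - means[None, :, :]) ** 2 / variances[None, :, :]
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
    log_p = np.log(weights)[None, :] + log_norm[None, :] - 0.5 * diff2.sum(axis=2)
    log_p -= log_p.max(axis=1, keepdims=True)
    resp = np.exp(log_p)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means, and variances from responsibilities
    nk = resp.sum(axis=0)
    weights = nk / N
    means = (resp.T @ X) / nk[:, None]
    variances = (resp.T @ (X ** 2)) / nk[:, None] - means ** 2
    return weights, means, np.maximum(variances, 1e-6)
```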
Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural networks (DNNs) in speech recognition systems significantly improves their performance. There are two phases in DNN-based phoneme recognition systems: training and testing. Mos...
Using the self-organizing map to speed up the probability density estimation for speech recognition with mixture density HMMs
This paper presents methods to improve the probability density estimation in hidden Markov models for phoneme recognition by exploiting the Self-Organizing Map (SOM) algorithm. The advantage of using the SOM stems from the approximative topology it creates among the mixture densities when the Gaussian mean vectors used as kernel centers are trained by the SOM algorithm. The topology makes the ne...
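One way to picture the speedup (a rough assumption about the mechanism, not the paper's exact procedure) is that, once the SOM has placed similar Gaussian means at nearby grid positions, the mixture likelihood of a frame can be approximated by evaluating only the components in a small grid neighborhood of a good starting unit, here taken to be the previous frame's best-matching unit. The name approx_mixture_loglik, the prev_bmu argument, and the Chebyshev-radius neighborhood are illustrative choices.

```python
# Rough sketch of topology-based pruning: evaluate full Gaussian densities
# only in a SOM grid neighborhood of the previous frame's best-matching unit.
import numpy as np

def approx_mixture_loglik(x, means, variances, weights, grid, prev_bmu, radius=2):
    """Approximate log sum_k w_k N(x; mu_k, Sigma_k) using only components
    whose grid coordinates lie within `radius` of the previous BMU."""
    near = np.max(np.abs(grid - grid[prev_bmu]), axis=1) <= radius
    idx = np.where(near)[0]
    m, v, w = means[idx], variances[idx], weights[idx]
    # diagonal-Gaussian log densities for the selected components only
    log_dens = -0.5 * np.sum(np.log(2 * np.pi * v) + (x - m) ** 2 / v, axis=1)
    log_terms = np.log(w) + log_dens
    mx = log_terms.max()
    # component with the highest weighted density seeds the next frame's search
    new_bmu = idx[np.argmax(log_terms)]
    return mx + np.log(np.sum(np.exp(log_terms - mx))), new_bmu
```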
Principal mixture speaker adaptation for improved continuous speech recognition
Nowadays, almost all speaker-independent (SI) speech recognition systems use CDHMMs with multivariate Gaussian mixtures as observation densities to cover speaker variabilities. It has been shown that, given sufficient training data, the more mixtures are used in the HMM observation density, the better the system performs. However, an acoustic HMM with more Gaussian densities is more complex and slows ...
Options for Modelling Temporal Statistical Dependencies in an Acoustic Model for ASR
In this paper we consider the combination of hidden Markov models based on Gaussian mixture densities (GMM-HMM) and linear dynamic models (LDM) as the acoustic model for automatic speech recognition systems. In doing so, the individual strengths of both models, i.e. the modelling of long-term temporal dependencies by the GMM-HMM and the direct modelling of statistical dependencies between conse...